Generating Search Term Variants for Text Collections with Historic Spellings
Abstract
In this paper, we describe a new approach to retrieval in texts with non-standard spelling, which is important for historic texts in English or German. For this purpose, we present a new algorithm for generating search term variants in historic orthography. By applying a spell checker to a corpus of historic texts, we generate a list of candidate terms, to which the contemporary spellings have to be assigned manually. Our algorithm then produces a set of probabilistic rules, whose probabilities can be used for ranking in the retrieval stage. An experimental comparison shows that our approach outperforms competing methods.
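The idea of expanding a search term with probabilistic rewrite rules can be illustrated with a minimal sketch. It is not the paper's algorithm: the rule list, the probabilities, the function name `generate_variants`, and the scoring scheme (a plain product of the probabilities of the rules applied) are all assumptions made for illustration. The sketch takes a contemporary term, applies hypothetical "modern substring to historic substring" rules from left to right, and returns the variants sorted by score so that the scores could feed a ranking step.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: (modern substring, historic substring, probability).
# The values are illustrative assumptions, not rules taken from the paper.
RULES: List[Tuple[str, str, float]] = [
    ("t", "th", 0.3),
    ("ei", "ey", 0.4),
    ("u", "v", 0.2),
]


def generate_variants(
    term: str,
    rules: List[Tuple[str, str, float]] = RULES,
    min_score: float = 0.01,
) -> List[Tuple[str, float]]:
    """Return (variant, score) pairs, best score first.

    The score of a variant is the product of the probabilities of the
    rules applied to reach it (an assumed, simplified scoring scheme).
    """
    variants: Dict[str, float] = {term: 1.0}
    # Each frontier entry: (text, score, first index still open for rewriting).
    # Rewriting strictly left to right avoids rewriting already rewritten text.
    frontier = [(term, 1.0, 0)]
    while frontier:
        text, score, start = frontier.pop()
        for modern, historic, prob in rules:
            pos = text.find(modern, start)
            while pos != -1:
                new_score = score * prob
                if new_score >= min_score:  # prune very unlikely variants
                    new_text = text[:pos] + historic + text[pos + len(modern):]
                    if new_score > variants.get(new_text, 0.0):
                        variants[new_text] = new_score
                    frontier.append((new_text, new_score, pos + len(historic)))
                pos = text.find(modern, pos + 1)
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(variants.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for variant, score in generate_variants("teil"):
        print(f"{variant}\t{score:.2f}")
```

In a retrieval setting, each generated variant would be issued as an additional query term, and its score could be used to weight matches, so that documents matching a likely variant rank above those matching an unlikely one. The `min_score` cutoff is one simple way to keep the number of variants manageable.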
Related resources
Discovery of Term Variation in Japanese Web Search Queries
In this paper we address the problem of identifying a broad range of term variations in Japanese web search queries, where these variations pose a particularly thorny problem due to the multiple character types employed in its writing system. Our method extends the techniques proposed for English spelling correction of web queries to handle a wider range of term variants including spelling mist...
Rule-based search in historical text databases - Visualization techniques
The project Rule-Based Search in Historical Databases with Non-Standard Spellings (RSNSR, Pilz et al. 2005) will provide an online-available search-engine that can be used by interested amateurs as well as professional linguists. Parallel to the implementation of a customizable software architecture to support an efficient search functionally recalling all relevant historical spellings of a mod...
Multi-User File System Search
Information retrieval research usually deals with globally visible, static document collections. Practical applications, in contrast, like file system search and enterprise search, have to cope with highly dynamic text collections and have to take into account user-specific access permissions when generating the results to a search query. The goal of this thesis is to close the gap between info...
Phonetic Models for Generating Spelling Variants
Proper names, whether English or non-English, have several different spellings when transliterated from a non-English source language into English. Knowing the different variations can significantly improve the results of name-searches on various source texts, especially when recall is important. In this paper we propose two novel phonetic models to generate numerous candidate variant spellings...
Comparison of LVG and MetaMap Functionality
LVG and MetaMap both compute lexical variants but were developed for quite different purposes: LVG’s raison d’être is lexical variant generation whereas MetaMap’s main purpose is to map text to corresponding concepts in the UMLS® Metathesaurus (Meta), one of the UMLS knowledge sources. Besides generating lexical variants, LVG has the subsumed ability to normalize words and the supplementary abi...